NSF PAR Search | NSF Public Access Repository

Integrating diverse corpora for training an endangered language machine translation system

Scheppat, Hunter; Hartshorne, Joshua; Leddy, Dylan; Le_Ferrand, Eric; Prud'hommeaux, Emily (March 2025, Proceedings of the Eight Workshop on the Use of Computational Methods in the Study of Endangered Languages (ComputEL))

Free, publicly-accessible full text available March 1, 2026
Studying the impact of language model size for low-resource ASR

Liu, Zoey; Spence, Justin; Prud'hommeaux, Emily (January 2023, Proceedings of the Sixth Workshop on the Use of Computational Methods in the Study of Endangered Languages (ComputEL-6))

Full Text Available
Combining Simple but Novel Data Augmentation Methods for Improving Conformer ASR

https://doi.org/10.21437/Interspeech.2022-10835

Damania, Ronit; Homan, Christopher; Prud'hommeaux, Emily (January 2022, Interspeech 2022)

Full Text Available
Enhancing Documentation of Hupa with Automatic Speech Recognition

https://doi.org/10.18653/v1/2022.computel-1.23

Liu, Zoey; Spence, Justin; Prud'hommeaux, Emily (January 2022, Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages)

Full Text Available
Automatic Speech Recognition for Supporting Endangered Language Documentation

Prud'hommeaux, Emily; Jimerson, Robbie; Hatcher, Richard; Michelson, Karin (January 2021, Language Documentation and Conservation)

Full Text Available
Synthetic Data Augmentation for Improving Low-Resource ASR

https://doi.org/10.1109/WNYIPW.2019.8923082

Thai, Bao; Jimerson, Robert; Arcoraci, Dominic; Prud'hommeaux, Emily; Ptucha, Raymond (October 2019, 2019 IEEE Western New York Image and Signal Processing Workshop (WNYISPW))

Although the application of deep learning to automatic speech recognition (ASR) has resulted in dramatic reductions in word error rate for languages with abundant training data, ASR for languages with few resources has yet to benefit from deep learning to the same extent. In this paper, we investigate various methods of acoustic modeling and data augmentation with the goal of improving the accuracy of a deep learning ASR framework for a low-resource language with a high baseline word error rate. We compare several methods of generating synthetic acoustic training data via voice transformation and signal distortion, and we explore several strategies for integrating this data into the acoustic training pipeline. We evaluate our methods on an indigenous language of North America with minimal training resources. We show that training initially via transfer learning from an existing high-resource language acoustic model, refining weights using a heavily concentrated synthetic dataset, and finally fine-tuning to the target language using limited synthetic data reduces WER by 15% over just transfer learning using deep recurrent methods. Further, we show improvements over traditional frameworks by 19% using a similar multistage training with deep convolutional approaches.
Full Text Available
Pragmatic Characteristics of Security Conversations: An Exploratory Linguistic Analysis

https://doi.org/10.1109/CHASE.2019.00026

Meyers, Benjamin S.; Munaiah, Nuthan; Meneely, Andrew; Prud'hommeaux, Emily (May 2019, International Conference on Software Engineering, CHASE Workshop)

Full Text Available
Improving ASR Output for Endangered Language Documentation

https://doi.org/10.21437/SLTU.2018-39

Jimerson, Robert; Simha, Kruthika; Ptucha, Ray; Prud'hommeaux, Emily (August 2018, The 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages)

Documenting endangered languages supports the historical preservation of diverse cultures. Automatic speech recognition (ASR), while potentially very useful for this task, has been underutilized for language documentation due to the challenges inherent in building robust models from extremely limited audio and text training resources. In this paper, we explore the utility of supplementing existing training resources using synthetic data, with a focus on Seneca, a morphologically complex endangered language of North America. We use transfer learning to train acoustic models using both the small amount of available acoustic training data and artificially distorted copies of that data. We then supplement the language model training data with verb forms generated by rule and sentences produced by an LSTM trained on the available text data. The addition of synthetic data yields reductions in word error rate, demonstrating the promise of data augmentation for this task.
Full Text Available
Multimodal Alignment for Affective Content

Haduong, Nikita; Nester, David; Vaidyanathan, Preethi; Prud'hommeaux, Emily; Bailey, Reynold; Alm, Cecilia (January 2018, AAAI Workshop on Affective Content Analysis)

Humans routinely extract important information from images and videos, relying on their gaze. In contrast, computational systems still have difficulty annotating important visual information in a human-like manner, in part because human gaze is often not included in the modeling process. Human input is also particularly relevant for processing and interpreting affective visual information. To address this challenge, we captured human gaze, spoken language, and facial expressions simultaneously in an experiment with visual stimuli characterized by subjective and affective content. Observers described the content of complex emotional images and videos depicting positive and negative scenarios, as well as their feelings about the imagery being viewed. We explore patterns of these modalities, for example by comparing the affective nature of participant-elicited linguistic tokens with image valence. Additionally, we expand a framework for generating automatic alignments between the gaze and spoken language modalities for visual annotation of images. Multimodal alignment is challenging because the temporal offset between the modalities varies. We explore alignment robustness when images have affective content and whether image valence influences alignment results. We also study whether word frequency-based filtering impacts results, with both the unfiltered and filtered scenarios performing better than baseline comparisons, and with filtering resulting in a substantial decrease in alignment error rate. We provide visualizations of the resulting annotations from multimodal alignment. This work has implications for areas such as image understanding, media accessibility, and multimodal data fusion.
Full Text Available
